Ancient DNA from Protohistoric Period Cambodia indicates that South Asians admixed with local populations as early as 1st–3rd centuries CE

Indian cultural influence is remarkable in present-day Mainland Southeast Asia (MSEA), and it may have stimulated early state formation in the region. Various present-day populations in MSEA harbor a low level of South Asian ancestry, but previous studies failed to detect such ancestry in any ancient individual from MSEA. In this study, we discovered a substantial level of South Asian admixture (ca. 40–50%) in a Protohistoric individual from the Vat Komnou cemetery at the Angkor Borei site in Cambodia. The location and direct radiocarbon dating result on the human bone (95% confidence interval is 78–234 calCE) indicate that this individual lived during the early period of Funan, one of the earliest states in MSEA, which shows that the South Asian gene flow to Cambodia started about a millennium earlier than indicated by previous published results of genetic dating relying on present-day populations. Plausible proxies for the South Asian ancestry source in this individual are present-day populations in Southern India, and the individual shares more genetic drift with present-day Cambodians than with most present-day East and Southeast Asian populations.

www.nature.com/scientificreports/ Thai [8][9][10] . Using methods based on autosomal haplotypes, this genetic admixture was dated to around the 14 th c. CE (1194 CE-1502 CE) in a Khmer group from Cambodia 11 ). In a later study, the admixture date for the Khmer group from Cambodia was confirmed (12th-13th cc. CE), and admixture date estimates for a wider range of present-day SEA groups were obtained: they lie between the 16th c. CE for Bamar and the 4th cc. CE for Giarai (Jarai) from Vietnam 9,10 . Previous ancient DNA studies, although of very limited extent due to poor ancient DNA preservation in tropical climate, did not detect South Asian ancestry in any ancient MSEA individuals 3,4 .
Here we report a finding of substantial South Asian ancestry in a Protohistoric Period male child from the Vat Komnou cemetery at the Angkor Borei site on the western edge of Cambodia's Mekong Delta. The walled and moated Angkor Borei site was first occupied in the middle of the first millennium BCE and is one of the earliest dated urban sites in the Mekong Delta in either Cambodia or Vietnam 12 . In addition to brick architectural monuments, associated moats, and ponds, the Vat Komnou mortuary assemblage includes human burials, beads, ceramics, multiple pig skulls, and other faunal remains 13,14 . The Vat Komnou cemetery is one of the largest archaeological skeletal samples analyzed to date from Cambodia.
Genetic data for the male child from the Vat Komnou cemetery (skeletal code AB M-40) were first reported by Lipson et al. 4 : 0.047 × coverage on the 1240K SNP panel used for enrichment of human ancient DNA 15,16 , and we increased the coverage to 0.061 × and generated variant calls at 64,103 1240K sites (as compared to 54,221 sites in Lipson et al. 4 ). A direct radiocarbon date on the human bone (95% confidence interval is 78-234 calCE) was also obtained for the individual AB M-40 by Lipson et al. 4 .

Genetic overview of ancient Southeast Asians. The location of the ancient individual from Cambodia
(AB M-40 or I1680) is illustrated in Fig. 1. We explored genetic ancestry of this individual and other ancient individuals from Southeast Asia for whom published genetic data of acceptable quality were available (i.e., pseudo-haploid genotypes are available for more than 50,000 1240K sites). We computed principal components (PCs) using present-day populations from Europe, from East, Southeast, South, and Central Asia, the Andaman Islands, and Siberia, and projected 28 ancient individuals (from McColl et al. 3 , Lipson et al. 4 , and this study) onto these PCs. The PC1 vs. PC2 plot (Fig. 2, Suppl. Fig. S1) reveals three clusters of individuals: East and Southeast Asian (ESEA), European (EUR), and Andamanese Negrito. South Asian (SAS) populations form a cline between the European and Andamanese Negrito clusters, with populations from Pakistan and Northern India lying closer to the European cluster, in line with previous studies on the population genetics of South Asia [17][18][19] . The positions of populations speaking languages of the Munda branch of the Austroasiatic language family, such as Birhor and Kharia, deviate from the South Asian cline, and this is not unexpected since they were shown to harbor Southeast Asian admixture 18,20 . Kusunda, a group from Nepal, and Riang, a Sino-Tibetan-speaking group from India, fall close to the East and Southeast Asian cluster. These populations harbor a relatively high proportion of an ancestry component maximized in ESEA populations according to our ADMIXTURE analysis (Fig. 3). Central Asian and Siberian populations are distributed along PC1 between the SAS-EUR cline and the ESEA cluster, in line with many previous studies, such as Jeong et al. 21 . Most ancient individuals dated to 2500 BCE-1950 CE from Southeast Asia fall within the ESEA cluster, except for the individual AB M-40/I1680 from the Vat   22 analysis also indicates that this individual stands out among Southeast Asians since his genome harbors a high proportion of both "blue" and "grey" ancestry components, maximized in European and Andamanese populations, respectively. Both components are also prominent in South Asian populations (Fig. 3). Statistics  24 ], all other ESEA groups; AB M-40/I1680, presentday Cambodians) are all positive, with most absolute Z-scores higher than 2 (Suppl. Fig. S4), which indicates attraction between present-day Kinh (from Mallick et al. 24 ) and the Protohistoric individual AB M-40/I1680. However, when we replaced Kinh from Mallick et al. 24 with a much larger Kinh group from the 1000 Genomes Project 25 , the f 4 -statistics showed a mix of positive and negative values (Suppl. Fig. S4). The dataset-dependent attraction between Kinh and the Protohistoric individual AB M-40/I1680 indicated that the question about their relationship cannot be settled at this point.
To sum up, these PCA and ADMIXTURE results (and f 4 -statistics) suggest that substantial South Asian admixture is present in the individual AB M-40/I1680, which sets him apart from 25 other Southeast Asians for whom genetic data of acceptable quality was reported in the literature 3,4 .     8,18,20,28,29 . The result is consistent across combinations of ESEA and SAS surrogates (Fig. 5, Suppl. Fig. S5). We formally tested for South Asian admixture in the ancient individual AB M-40/I1680 from Cambodia (along with other published ancient Southeast Asian individuals) using a qpAdm 30 protocol with a "rotating" set of reference or "right" populations 31 . We first performed pairwise qpWave tests to check if the target ancient individual is cladal (genetically continuous) with a single reference population (see "Materials and methods" for details). None of these pairwise qpWave models were plausible (Suppl . Table S1). Next, we tested all possible pairs of ancestry sources (66 pairs) from the set of reference populations (see "Materials and methods"), and found that the genome of the Protohistoric Cambodian individual AB M-40/I1680 can be modelled as a two-way mixture of East Asian and South Asian populations (Fig. 6, Suppl. Table S1). Among plausible models (those with p-value is higher than 0.05 and with inferred admixture proportions ± 2 standard errors between 0 and 1 for all ancestry components) for the Protohistoric Cambodian individual AB M-40/I1680, Ami is the only East Asian surrogate, and Irula, Mala, and Vellalar are the only non-East Asian surrogates that result in plausible models (Fig. 6, Suppl. Table S1). Remarkably, all these surrogates are from Southern India (Fig. 1), and the fraction of  The plot illustrates all plausible qpAdm models inferred using the rotating strategy (see details in "Materials and methods"). All the plausible models consist of two surrogates: (1) an East and Southeast Asian surrogate (Ami, on the right-hand side of each bar) and (2)  www.nature.com/scientificreports/ Southern Indian ancestry was estimated at 42-49% across all plausible qpAdm models (Fig. 6, Suppl. Table S1). The fact that qpAdm models including European, Middle Eastern, Caucasian, or even Northern Indian ancestry sources are rejected suggests that contamination with modern DNA is an unlikely explanation for our results. Moreover, the libraries passed routine ancient DNA authenticity checks such as damage rate at a terminal read nucleotide: a damage rate below 0.03 is considered problematic 4 . See Suppl. Table S2 for detailed statistics for the merged data and for each library. The genetic makeup of some ancient individuals from Laos (La727), Malaysia (Ma554, Ma555), Thailand (Th531), and Vietnam (I859, I2947, I0626, Vt880) can be interpreted in the light of our qpAdm models as a twoway admixture of East Asians and a deeply diverged East Eurasian lineage (represented by Onge and Papuans in the set of reference populations) (Suppl. Table S1). The results for all the other ancient individuals are ambiguous, as plausible models for some individuals are diverse (i.e., East Asians + various proxies such as West Asians, Caucasians, Europeans, South Asians, the deep East Eurasian lineage). The fact that qpAdm failed to reject multiple models with various West Eurasian proxies could be caused by modern DNA contamination or by lack of data. For few individuals, we were unable to find any plausible qpAdm model (Suppl . Table S1).

South Asian ancestry in ancient Southeast
The Protohistoric Cambodian individual AB M-40/I1680 harbors a considerably higher proportion of South Asian ancestry as compared to present-day Cambodians (42-49% vs 9%), as estimated in a previous study 10 . However, the qpAdm reference populations used here and in the previous study 10 are different. To avoid an unforeseen bias which may cause the excess of South Asian ancestry in the Protohistoric Cambodian individual AB M-40/I1680, we estimated the proportion of South Asian admixture with the same set of surrogates and qpAdm "right" populations for the ancient individual and present-day Cambodians. Since no plausible model was found for present-day Cambodians when we used the reference population set from the analysis above, we replaced Ami with Dusun and performed qpAdm analysis with the new set of reference populations (see "Materials and methods"). The results indicate that South Asian ancestry in the ancient individual is indeed much higher than in present-day Cambodians: 37-44% and 12-15%, respectively (Suppl . Table S1).

Discussion
We report new archaeogenomic data from a radiocarbon-dated individual AB M-40/I1680 from Cambodia (95% confidence interval for the date is 78-234 calCE) that serves as a key data point for dating the beginning of the South Asian gene flow into Southeast Asia. This individual was previously analyzed by Lipson et al. 4 , but we generated additional data to increase the sensitivity of the analyses in the current study. South Asians were not considered as a potential ancestry source for MSEA groups in the original study, which explains the discrepancy between our results and those by Lipson et al. 4 . For instance, PCA in Fig. 1 of that study 4 lacks any West Eurasian groups, and another PCA in Fig. S1 of that study 4 lacks any South Asian groups except for the Andamanese. South Asian admixture is present in various present-day MSEA populations, especially those heavily influenced by Indian culture [8][9][10] . However, previous aDNA studies did not detect South Asian admixture in ancient individuals across MSEA, although due to the immense technical difficulties genomic data (of any quality) were generated for just 21 individuals from Cambodia, Laos, Malaysia, Thailand, and Vietnam dated to the 5th c. BCE and later 3,4 . As suggested by archaeological evidence 6,7 , we expect to find traces of interactions with South Asia starting in the 4th c. BCE.
An earlier study estimated the proportion of South Asian ancestry in present-day Cambodians at ca. 9% 10 . Notably, the proportion of South Asian ancestry in the ancient individual from Cambodia (point estimates ranging from 42 to 49% across all plausible qpAdm models) is considerably higher than in the present-day majority population of the country and in any present-day MSEA population investigated in the literature (see an overview in Changmai et al. 10 ). In this study, we also demonstrate that the ancient individual from Cambodia harbors an approximately three times higher proportion of South Asian ancestry as compared to the present-day Cambodians, as estimated by qpAdm with the same set of reference populations (Suppl . Table S1). We were also not able to detect a clear signal of South Asian ancestry in any other 25 published ancient individuals from 2500 BCE to 1950 CE across Southeast Asia. Since we are dealing with a single individual only, it is hard to estimate the intensity of the gene flow and to extrapolate this estimate to the general Protohistoric population in the territory of Cambodia.
The location and date (1st-3rd c. CE) of the individual AB M-40/I1680 fit the early period of Funan, an early state in the territories of Cambodia and Vietnam. Chinese written sources documented that the Funan dynasty was established by an Indian Brahmin named Kaundinya and a local princess named Soma 34 . Archaeological evidence from glass and stone beads recovered from the Mekong Delta and peninsular Thailand 35 and archaeobotanical remains 36 suggests the possibility of multi-ethnic residence in areas of Protohistoric MSEA whose www.nature.com/scientificreports/ populations engaged in maritime trade (e.g., Bellina 37 ). Collectively, these data suggest some level of Indian cultural influence in the Mekong Delta in the 1st-3rd c. CE. The only plausible South Asian genetic sources for the individual from Funan, inferred by the qpAdm method, are populations from Southern India. Even though the results suggest that South Indian populations are by far the most plausible surrogates for this individual, we caution that the actual ancient sources possibly had a genetic profile different to the present-day South Asian populations we used for the qpAdm analysis. Consensus now holds that the early first-millennium Mekong Delta kingdoms associated with the Chinesenamed Funan were the predecessor states for the Angkorian empire, but what language was spoken before the seventh century CE (when the earliest inscription in the Delta appeared, at Angkor Borei 38 ) remains a mystery. The term "Funan" is based on the modern Chinese pronunciation. Whether the local pronunciation was "biunâm", resembling the old Khmer word "bnam", meaning "mountain, " remains an unresolved question 5 . Continuities in architectural style, imagery, and settlement forms from the Protohistoric through later Angkorian period suggest that some residents of Funan spoke old Khmer. Outgroup f 3 -statistics (Fig. 4) support this point as they show that Cambodians and Kinh (Vietnamese) are present-day groups sharing the highest amount of genetic drift with the ancient individual AB M-40/I1680 from Funan. When we compared the ancient individual AB M-40/I1680 and present-day Cambodians, we detected a signal of additional gene flow from Atayal-related (probably Austronesian-speaking) groups in present-day Cambodians. The signal, previously found by a different method in an earlier study 10 , could be a result of long-term interactions between Cambodians and Cham, an Austronesian-speaking group and the biggest ethnic minority group in present-day Cambodia 1 , documented throughout their history 5 . Kinh (data from Mallick et al. 24 ) share more genetic drift with the ancient individual AB M-40/I1680 than present-day Cambodians (Suppl. Fig. S4), which is consistent with outgroup f 3 -statistic results (Fig. 4). However, the extra attraction was not detected in Kinh from the 1000 Genomes Project 25 (Suppl. Fig. S4). We note that the Kinh group from Mallick et al. 24  Changmai et al. 10 estimated the date of South Asian admixture events in various present-day MSEA populations between the 4th and 16th cc. CE. The inferred dates of South Asian admixture in present-day Cambodians and in a Khmer group from Thailand are 771-808 BP and 1218-1291 BP (95% confidence intervals), respectively 10 , but the date of the ancient individual AB M-40/I1680 is much older (1716-1872 calBP, 95% confidence interval). Indeed, true dates of admixture events, especially its starting date, do not necessarily fall into the range of admixture dates estimated from similar present-day populations from the same or proximate locations. Our results suggest that some South Asians migrated to MSEA and intermarried with local people before or at the early stage of state formation. These South Asians may have influenced the expansion of Indian culture and the establishment of Indian-style states (also known as Indianized states 5 ). aDNA data from MSEA are scarce, so further data collection is necessary to trace the incorporation of South Asian ancestry into the populations of this region.

Materials and methods
Archaeological context, ancient sample preparation, and dataset compilation.  4 . The individual was sorted from commingled remains of at least five different individuals. The skeletal remains for AB M-40 included cranial and postcranial elements of a male child whose age was estimated to be 5-7 years at the time of death based on clavicular diaphyseal length, the degree of epiphyseal fusion, and dental calcification and eruption. The cranial elements include the right and left parietals, occipital, frontal, right zygomatic, inferior maxillary bones, left temporal, and the petrous portion of the right temporal. The mandible was missing the right ramus. A left clavicular shaft and a thoracic vertebra represent the postcranial remains for this individual. DNA was obtained from the petrous portion of the right temporal bone (for details on DNA extraction and library preparation see Lipson et al. 4 ).
In the present study, we generated two single-stranded sequencing libraries for this individual (according to a protocol by Gansauge et al. 39 ) in addition to 7 analyzed by Lipson et al. 4 , which increased the coverage for autosomal targets on the 1240K SNP panel 15,16 from 0.047 × to 0.061 × and increased the number of usable SNPs from 54,221 to 64,103 1240K sites (see details on all the sequencing libraries in Suppl. Table S2). We merged the new data with published genome-wide data for ancient individuals from Southeast Asia 3,4 and worldwide present-day populations 24,27,[40][41][42] using PLINK v.1.90b6.10 (https:// www. cog-genom ics. org/ plink/). We kept only autosomal SNPs from the 1240K panel 15,16 and filtered out ancient individuals who have data for fewer than 50,000 autosomal SNPs. Details of the dataset composition are available in Suppl. www.nature.com/scientificreports/ Principal component analysis (PCA). We performed PCA using smartpca 43 version 16,000 from the EIGENSOFT package (https:// github. com/ DReic hLab/ EIG). We computed principal components for presentday populations from Central, East, Southeast, and South Asia, Andamanese Islands, Siberia, and Europe. We then projected the ancient Southeast Asian individuals onto the principal components using options "lsqproject: YES" and "shrinkmode: YES".
We first performed pairwise qpWave modeling using the "qpwave" function from the R package "ADMIX-TOOLS2" 45 . We considered a pair composed of the target individual (AB M-40/I1680 or one of 25 other ancient MSEA individuals) and one of the 12 reference populations (all except Mbuti) as "left populations". We assigned the remaining 12 reference populations as "right populations". We used a cut-off p-value of 0.05 and tested both "allsnps = TRUE" (using all the possible SNPs for each f 4 -statistic) and "allsnps = FALSE" settings (using only SNPs with no missing data at the group level across all the "left" and "right" populations). Overall, we tested 168 group pairs with qpWave per one target group per one "allsnps" setting.
As none of the qpWave models for all the ancient MSEA targets was fitting the data, we further tested 2-way admixture models using qpAdm with a "rotating" strategy 31 . All possible combinations of two reference populations (except for Mbuti) acted as surrogates for the target. The remaining 11 reference populations were assigned as "right populations". We defined a model that meets two following criteria as a "plausible model": (1) p-value is higher than 0.05; (2) inferred admixture proportions ± 2 standard errors lie between 0 and 1 for all ancestry components. We tested qpAdm models using the "qpadm" function from the R package "ADMIXTOOLS2". We tested both "allsnps = TRUE" and "allsnps = FALSE" settings. In total, 924 models per target per one "allsnps" setting were tested.
For comparing South Asian ancestry proportions in the Protohistoric Cambodian individual AB M-40/I1680 and in present-day Cambodians, we replaced Ami with Dusun in the set of reference populations. We modeled the ancient individual and present-day Cambodians as a 2-way admixture of Dusun and one of the following South Asian populations: Irula, Mala, and Vellalar (plausible South Asian sources for the Protohistoric Cambodian individual in the previous qpAdm analysis). The set of right populations consists of Mbuti, Japanese, Yi, Onge, Papuan, Pima, Karitiana, Druze, Adygei, Sardinian, and Orcadian.

Data availability
The newly reported aligned DNA reads for the Protohistoric period individual from the Vat Komnou cemetery are publicly available through the European Nucleotide Archive (project accession no. PRJEB54975).